Advances in Automatic Text Summarization

نویسندگان

  • Inderjeet Mani
  • Mark T. Maybury
  • Mark Sanderson
چکیده

It has been said for decades (if not centuries) that more and more information is becoming available and that tools are needed to handle it. Only recently, however, does it seem that a sufficient quantity of this information is electronically available to produce a widespread need for automatic summarization. Consequently, this research area has enjoyed a resurgence of interest in the past few years, as illustrated by a 1997 ACL Workshop, a 1998 AAAI Spring Symposium and in the same year SUMMAC: a TREC-like TIPSTER-funded summarization evaluation conference. Not unexpectedly, there is now a book to add to this list: Advances in Automatic Summarization, a collection of papers edited by Inderjeet Mani and Mark T. Maybury and published by The MIT Press. Half of it is a historical record: thirteen previously published papers, including classics such as Luhn's 1958 word-counting sentence-extraction paper, Edmundson's 1969 use of cue words and phrases, and Kupiec, Pedersen, and Chen's 1995 trained summarizer. The other half of the book holds new papers, which attempt to cover current issues and point to future trends. It starts with a paper by Karen Sp~irck Jones, which acts as an overall introduction. In it, the summarization process and the uses of summaries are broken down into their constituent parts and each of these is discussed (it reminded me of a much earlier Sp~rck Jones paper on categorization [1970]). Despite its comprehensiveness and authority, I must confess to finding this opener heavy going at times. The rest of the papers are grouped into six sections, each of which is prefaced with two or three well-written pages from the editors. These introductions contain valuable commentary on the coming papers--even pointing out a possible flaw in the evaluation part of one. The opening section holds three papers on so-called classical approaches. Here one finds the oft-cited papers of Luhn, Edmundson, and Pollock and Zamora. As a package, these papers provide a novice with a good idea of how basic summarization works. My only quibble was in their reproduction. In Luhn's paper, an article from Scientific American is summarized and it would have been beneficial to have this included in the book as well. Some of the figures in another paper contained very small fonts and were hard to read; fixing this for a future print run is probably worth thinking about. The next section holds papers on corpus-based approaches to summarization, starting with Kupiec et al.'s paper about a summarizer trained on an existing corpus of manually abstracted documents. Two new papers building upon the Kupiec et al. work follow this. Exploiting the discourse structure of a document is the topic of the next section. Of the five papers here, I thought Daniel Marcu's was the best, nicely describing summarization work so far and then clearly explaining his system, which is based on Rhetorical Structure Theory. The following section on knowledge-rich approaches to summarization covers such things as Wendy Lehnert's work on breaking

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

متن کامل

Systematic literature review of fuzzy logic based text summarization

Information Overloadrq  is not a new term but with the massive development in technology which enables anytime, anywhere, easy and unlimited access; participation & publishing of information has consequently escalated its impact. Assisting userslq    informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

متن کامل

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

    Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

متن کامل

Handling Figures In Document Summarization

Some document genres contain a large number of figures. This position paper outlines approaches to diagram summarization that can augment the many well-developed techniques of text summarization. We discuss figures as surrogates for entire documents, thumbnails, extraction, the relations between text and figures as well as how automation might be achieved. The focus is on diagrams (line drawing...

متن کامل

An Evolutionary Algorithm for Automatic Summarization

This paper proposes a novel method to select sentences for automatic summarization based on an evolutionary algorithm. The algorithm explores candidate summaries space following an objective function computed over ngrams probability distributions of the candidate summary and the source documents. This method does not consider a summary as a stack of independent sentences but as a whole text, an...

متن کامل

TL;DR: Mining Reddit to Learn Automatic Summarization

Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data. We propose a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a “TL;DR” to long posts. A case study usi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000